Debiased Bayesian inference for average treatment effects

Kolyan Ray, Botond Szabo

Neural Information Processing Systems

Working in the standard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-order posterior bias, thereby improving performance. We illustrate our method for Gaussian process (GP) priors using (semi-)synthetic data.


A Proofs

Neural Information Processing Systems

This appendix contains the proofs of the results found in Section 4. We start by introducing a useful lemma. The claim then follows directly from (4) and the definition of mutual information (Lemma 2). We can then compute the derivative and ask under which conditions it is non-negative. The function b defined in (18) is monotonically increasing for positive arguments. Finally, let us fix ε > 0. Combining Lemmas 7 and 8, we obtain a bound in terms of b(σ...). The following lemma makes this statement precise. In this appendix, we also collect details about the experiment presented in Section 6, including code for the acquisition functions used. ISE selects the next parameter to evaluate according to (6), which is a non-convex optimization problem constrained in one of the variables.


Computation-Aware Gaussian Processes: Model Selection and Linear-Time Inference

Jonathan Wenger, Kaiwen Wu

Neural Information Processing Systems

Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables, at the cost of quadratic complexity, an explicit tradeoff between computational efficiency and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty, a fundamental prerequisite for optimal decision-making.



e8f2779682fd11fa2067beffc27a9192-Supplemental.pdf

Neural Information Processing Systems

In this analysis, we assume that evaluating the GP prior mean and kernel functions (and the corresponding derivatives) takes O(1) time. For each fantasy model, we need to compute the posterior mean and covariance matrix at the L points (x, w_{1:L}) on which we draw the sample paths. This results in a total cost of O(KML^2) to generate all samples. The SAA approach replaces a stochastic optimization problem with a deterministic approximation, which can be optimized efficiently. Suppose we are interested in the optimization problem min_x E_ω[h(x, ω)].
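To make the sample average approximation (SAA) idea concrete, here is a minimal sketch in Python. The integrand h and its parameters are illustrative assumptions, not taken from the paper: we fix a set of samples of ω once, which turns the stochastic objective E_ω[h(x, ω)] into a deterministic average that can be minimized with any standard optimizer (plain gradient descent below).

```python
import random

random.seed(0)

def h(x, omega):
    # Hypothetical integrand for illustration only:
    # squared distance between the decision x and a noisy target omega.
    return (x - omega) ** 2

# Draw a FIXED set of samples once. Reusing the same samples at every
# candidate x is what makes the SAA objective deterministic.
omegas = [random.gauss(1.0, 1.0) for _ in range(1000)]

def saa_objective(x):
    # Deterministic approximation of E_omega[h(x, omega)].
    return sum(h(x, w) for w in omegas) / len(omegas)

# Minimize the SAA objective with plain gradient descent,
# using a central finite-difference gradient.
x = 0.0
for _ in range(200):
    eps = 1e-5
    grad = (saa_objective(x + eps) - saa_objective(x - eps)) / (2 * eps)
    x -= 0.1 * grad

# For this quadratic h, the exact SAA minimizer is the sample mean,
# so the descent iterate should land right on top of it.
sample_mean = sum(omegas) / len(omegas)
print(abs(x - sample_mean) < 1e-3)
```

Because the samples are frozen, the objective no longer changes between evaluations, so standard deterministic solvers (and their convergence guarantees) apply directly; the price is that the SAA minimizer only approximates the true minimizer of E_ω[h(x, ω)], with error shrinking as the number of samples grows.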